Adaptive and multiple time-scale eligibility traces for online deep reinforcement learning

Abstract

Deep reinforcement learning (DRL) is a promising approach to teaching robots to perform complex tasks. Because methods that directly reuse stored experience data cannot follow changes in robotic problems with a time-varying environment, online DRL is required. The eligibility traces method is well known as a technique for improving sample efficiency in traditional reinforcement learning with linear regressors, rather than in DRL. The dependency between the parameters of deep neural networks would destroy the eligibility traces, which is why they are not integrated with DRL. Although replacing the accumulated gradients with the most influential gradient can alleviate this problem, the replacing operation reduces the number of reuses of previous experiences. To address these issues, this study proposes a new eligibility traces method that can be used even in DRL while maintaining high sample efficiency. When the accumulated gradients differ from those computed using the latest parameters, the proposed method takes into account the divergence between the past and latest parameters to adaptively decay the eligibility traces. Bregman divergences between the outputs computed by the past and latest parameters are exploited, due to the infeasible computational cost of measuring the divergence between the parameters directly. In addition, a generalized method with multiple time-scale traces is designed for the first time. This design allows the replacement of the (decayed) …
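The core idea of adaptively decaying a trace when the current parameters diverge from those under which the trace was accumulated can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the linear model, the exponential decay schedule, and the `beta` coefficient are all assumptions, and the squared Euclidean distance stands in for a Bregman divergence between outputs (it is itself the Bregman divergence generated by the squared norm).

```python
import numpy as np

def squared_divergence(old_out, new_out):
    # Stand-in for a Bregman divergence between outputs; squared
    # Euclidean distance is the Bregman divergence of ||x||^2 / 2.
    return float(np.sum((np.asarray(old_out) - np.asarray(new_out)) ** 2))

class AdaptiveTraceTD:
    """TD learner with an eligibility trace whose decay shrinks when
    current outputs diverge from outputs stored at trace-accumulation
    time. Illustrative sketch only; names are not from the paper."""

    def __init__(self, n_features, alpha=0.1, gamma=0.99, lam=0.9, beta=1.0):
        self.w = np.zeros(n_features)      # value-function weights
        self.trace = np.zeros(n_features)  # eligibility trace
        self.alpha, self.gamma, self.lam, self.beta = alpha, gamma, lam, beta

    def value(self, x):
        return float(self.w @ x)

    def update(self, x, reward, x_next, old_out):
        # Adaptive decay: the larger the divergence between the output
        # recorded when the trace was built (old_out) and the current
        # output, the more the old trace is discounted.
        div = squared_divergence(old_out, self.value(x))
        decay = self.gamma * self.lam * np.exp(-self.beta * div)
        self.trace = decay * self.trace + x  # gradient of a linear model is x
        td_error = reward + self.gamma * self.value(x_next) - self.value(x)
        self.w += self.alpha * td_error * self.trace
        return td_error
```

For deep networks the per-parameter gradient would replace `x`, and the divergence would be computed between network outputs rather than raw values, since comparing the parameters themselves is computationally infeasible, as the abstract notes.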


Related articles

The Role of Multiple Neuromodulators in Reinforcement Learning That Is Based on Competition between Eligibility Traces

The ability to maximize reward and avoid punishment is essential for animal survival. Reinforcement learning (RL) refers to the algorithms used by biological or artificial systems to learn how to maximize reward or avoid negative outcomes based on past experiences. While RL is also important in machine learning, the types of mechanistic constraints encountered by biological machinery might be d...

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

Recently, a new multi-step temporal-difference learning algorithm, called Q(σ), was proposed; it unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, like other multi-step temporal-difference learning algorithms, Q(σ) requires substantial memory and computation time. The eligibility trace is an important mechanism for transforming the off-line updates into e...
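The interpolation that Q(σ) performs can be shown in a one-step tabular sketch: the bootstrap target blends the sampled next action value (Sarsa, σ = 1) with the policy expectation over next action values (one-step Tree-Backup / Expected Sarsa, σ = 0). Function and variable names here are illustrative assumptions, not from the cited paper.

```python
import numpy as np

def q_sigma_target(Q, r, s_next, a_next, policy, sigma, gamma=0.99):
    """One-step Q(sigma) backup target.

    Q      : (n_states, n_actions) action-value table
    policy : (n_states, n_actions) action probabilities
    sigma  : 1.0 -> Sarsa-style sampled bootstrap,
             0.0 -> Tree-Backup / Expected-Sarsa-style expectation.
    """
    sampled = Q[s_next, a_next]                   # sample the taken action
    expected = np.dot(policy[s_next], Q[s_next])  # expectation under policy
    return r + gamma * (sigma * sampled + (1.0 - sigma) * expected)
```

Intermediate values of σ trade off the variance of sampling against the bias and cost of the full expectation; the n-step version applies this blend recursively at each backup step.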

Evidence for eligibility traces in human learning

Whether we prepare a coffee or navigate to a shop: in many tasks we make multiple decisions before reaching a goal. Learning such state-action sequences from sparse reward raises the problem of credit assignment: which actions out of a long sequence should be reinforced? One solution provided by reinforcement learning (RL) theory is the eligibility trace (ET); a decaying memory of the state-acti...
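The "decaying memory" that an eligibility trace provides is easiest to see in tabular TD(λ): every visited state's trace is bumped, all traces decay by γλ each step, and the TD error is broadcast back to earlier states in proportion to their traces. A minimal sketch, with illustrative parameter choices:

```python
import numpy as np

def td_lambda_episode(transitions, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    transitions: list of (state, reward, next_state) tuples.
    """
    V = np.zeros(n_states)  # state values
    e = np.zeros(n_states)  # eligibility traces
    for s, r, s_next in transitions:
        e *= gamma * lam           # decay all traces
        e[s] += 1.0                # mark the current state as eligible
        td_error = r + gamma * V[s_next] - V[s]
        V += alpha * td_error * e  # credit earlier states via their traces
    return V
```

When a reward finally arrives at the end of a sequence, the decayed traces determine how much each earlier state-action is reinforced, which is precisely the credit-assignment mechanism the abstract describes.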

Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning

A mechanism called Eligibility Propagation is proposed to speed up the Time Hopping technique used for faster Reinforcement Learning in simulations. Eligibility Propagation provides for Time Hopping similar abilities to what eligibility traces provide for conventional Reinforcement Learning. It propagates values from one state to all of its temporal predecessors using a state transitions graph....

An Adaptive Learning Game for Autistic Children using Reinforcement Learning and Fuzzy Logic

This paper presents an adaptive serious game for rating social ability in children with autism spectrum disorder (ASD). The required measurements are obtained through challenges in the proposed serious game, which uses reinforcement learning concepts to be adaptive. It is based on fuzzy logic to evaluate the social ability level of children with ASD. The game adapts itsel...


Journal

Journal: Robotics and Autonomous Systems

Year: 2022

ISSN: 0921-8890, 1872-793X

DOI: https://doi.org/10.1016/j.robot.2021.104019